AMENDMENTS TO THE CLAIMS 

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims: 

1-3. (canceled) 

4. (currently amended) The fact extraction tool set of claim 7, wherein the
attributes include tokenization, text normalization, part of speech tags, sentence boundaries, 
parse trees, and semantic attribute tagging. 

5. (currently amended) The fact extraction tool set of claim 7, wherein the means
for annotating the text comprises a plurality of independent annotators, wherein each of the
annotators has at least one specific annotation function, and wherein the fact extraction tool set 
further comprises user-implemented means for specifying which of the annotators to use and the 
order of their use. 



6. (canceled) 



7. (currently amended) [[The fact extraction tool set of claim 6, further comprising]] A
fact extraction tool set for extracting information from a document, wherein the document
includes text, comprising:

means for breaking the text into tokens;

means for annotating the text with token attributes, constituent attributes, links, and
tree-based attributes, using XML as a basis for representing the annotated text and for resolving
conflicting annotation boundaries in the annotated text to produce well-formed XML from the
results of independent annotators; and

means for extracting facts from the annotated text using text pattern recognition rules, 
wherein each text pattern recognition rule comprises a pattern that describes text of interest, a 
label that names the pattern for testing and debugging purposes, and an action that indicates what
should be done in response to a matching of the pattern, and wherein the text pattern recognition
rules use regular expression-based functionality, tree-based functionality, and user-defined
matching functions.
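
Illustrative note (not part of the claims): the rule structure recited in claim 7, a pattern, a label, and an action, can be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's rule language: the names `Rule` and `extract_facts` and the sample sentence are hypothetical, and only the regular-expression side of the recited functionality is shown.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """Hypothetical container for one text pattern recognition rule."""
    label: str                            # names the pattern for testing and debugging
    pattern: re.Pattern                   # describes the text of interest
    action: Callable[[re.Match], dict]    # what to do when the pattern matches


def extract_facts(text: str, rules: List[Rule]) -> List[dict]:
    """Run every rule over the text and collect the facts its action returns."""
    facts = []
    for rule in rules:
        for match in rule.pattern.finditer(text):
            fact = dict(rule.action(match))
            fact["rule"] = rule.label     # keep the label so a match can be traced
            facts.append(fact)
    return facts


RULES = [
    Rule(
        label="acquisition",
        pattern=re.compile(r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+)"),
        action=lambda m: {"buyer": m.group("buyer"), "target": m.group("target")},
    )
]

facts = extract_facts("Acme acquired Widgetco in 2001.", RULES)
print(facts)
```

Keeping the label on each extracted fact is one simple way to support the testing and debugging purpose the claim attributes to the label.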

8. (currently amended) The fact extraction tool set of claim 12, wherein [[the means
for breaking the text into its base tokens and annotating the base tokens and patterns of base
tokens comprises independent annotators, wherein the annotators are of three types comprising]]:

the token attributes[[, which]] have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token; 

the constituent attributes are assigned yes-no values, where the entire pattern of 
each base token is considered to be a single constituent with respect to some annotation value; 
and 

the links, which assign common identifiers to coreferring and other related
patterns of base tokens. 
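
Illustrative note (not part of the claims): the three attribute types in claim 8, and their alignment to base tokens as recited in claim 12, can be pictured with a small data sketch. Everything here is an assumption made for illustration: the function name `align`, the field names, the `person` constituent type, and the sample sentence are hypothetical.

```python
# Base tokens produced by the tokenizing step.
tokens = ["Ms.", "Smith", "said", "she", "agreed"]

# Token attribute: one value attempted for every base token (here, a POS tag).
pos_tags = ["NNP", "NNP", "VBD", "PRP", "VBD"]

# Constituent attribute: a yes-no value over a whole span of base tokens.
# Spans are (start, end) token indices with an exclusive end.
person_spans = [(0, 2)]

# Links: a common identifier shared by coreferring spans.
links = {"e1": [(0, 2), (3, 4)]}


def align(tokens, pos_tags, person_spans, links):
    """Attach every annotation to the base tokens it covers."""
    rows = [
        {"text": tok, "pos": tag, "person": False, "link": None}
        for tok, tag in zip(tokens, pos_tags)
    ]
    for start, end in person_spans:
        for i in range(start, end):
            rows[i]["person"] = True
    for ident, spans in links.items():
        for start, end in spans:
            for i in range(start, end):
                rows[i]["link"] = ident
    return rows


rows = align(tokens, pos_tags, person_spans, links)
for row in rows:
    print(row)
```

Once every annotation is keyed to base tokens, a rule can test literal text, attributes, and relationships in a single pass over `rows`.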

9. (canceled) 

10. (canceled) 

11. (currently amended) The fact extraction tool set of claim 12, wherein the means
for identifying and extracting potentially interesting pieces of information [[comprises means for]]
performs the further function of recognizing both true left and right constituent attributes and
non-contiguous constituent attributes.



12. (currently amended) [[The fact extraction tool set of claim 10.]] A fact extraction tool
set for extracting information from a document, wherein the document includes text, comprising:
means for breaking the text into tokens;

means for annotating the text with token attributes, constituent attributes, links, and
tree-based attributes, using XML as a basis for representing the annotated text;

means for associating all annotations assigned to a particular piece of text with the base
tokens for that text to generate aligned annotations; and

[[wherein the]] means for identifying and extracting potentially interesting pieces of
information [[comprises at least one text pattern recognition rule]] in the aligned annotations by
finding patterns in the attributes of the annotated text using text pattern recognition rules written
in a rule-based information extraction language, wherein each text pattern recognition rule
comprises a pattern that describes text of interest, a label that names the pattern for testing and
debugging purposes, and an action that indicates what should be done in response to a matching
of the pattern, and wherein the text pattern recognition rules use regular expression-based
functionality, tree-based functionality, and user-defined matching functions, and wherein [[the at
least one]] each text pattern recognition rule queries for at least one of literal text, attributes, and
relationships found in the aligned annotations to define the facts to be extracted.



13-14. (canceled) 



15. (currently amended) The fact extraction tool set of claim 12, wherein the [[means
for identifying and extracting potentially interesting pieces of information further comprises at
least one auxiliary definition statement]] user-defined matching functions are used to name and
define a fragment of a pattern.

16. (currently amended) A rule-based information extraction language for use in
identifying and extracting potentially interesting pieces of information in aligned annotations in a
text, the language comprising [[at least one]] a plurality of text pattern recognition [[rule]] rules that
[[queries]] query for at least one of literal text, attributes, and relationships found in the aligned
annotations to define the facts to be extracted, [[using regular expression functionality, tree based
functionality, and auxiliary definitions in any combination]] wherein each of the text pattern
recognition rules comprises:

a pattern that describes text of interest;

a label that names the pattern for testing and debugging purposes; and
an action that indicates what should be done in response to a matching of the pattern; and
wherein the text pattern recognition rules use regular expression-based functionality,
tree-based functionality, and user-defined matching functions.

17-18. (canceled) 

19. (currently amended) The language of claim 16, [[further comprising at least one
auxiliary definition statement]] wherein the user-defined matching functions are used to name and
define a fragment of a pattern.

20. (currently amended) A text annotation tool comprising:
means for breaking text into its base tokens;

a plurality of independent annotators for annotating [[the base tokens and patterns of
base tokens with]] the text with token attributes, constituent attributes, links, and tree-based
attributes, using XML as a basis for representing the annotated text, wherein each of the
annotators has at least one specific annotation function;

user-implemented means for specifying which of the annotators to use and the order of
their use; [[and]]

means for associating all annotations assigned to a particular piece of text with the base 
tokens for that particular piece of text to generate aligned annotations; and

means for resolving conflicting annotation boundaries in the annotated text to produce
well-formed XML.
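
Illustrative note (not part of the claims): when independent annotators mark spans that cross each other, naive inline tagging yields XML that does not parse. One possible way to resolve such conflicting boundaries, assumed here purely for illustration and not asserted to be the claimed means, is to close and reopen the crossing element at the conflict point. The function name `to_well_formed_xml`, the `doc` wrapper element, and the sample spans are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def to_well_formed_xml(tokens, spans):
    """Serialize tokens plus (start, end, tag) spans; end is exclusive.

    Assumes every span is non-empty and lies within the token list.
    """
    out = ["<doc>"]
    open_spans = []  # spans currently open, outermost first
    for i, token in enumerate(tokens):
        ending = [s for s in open_spans if s[1] == i]
        if ending:
            # Close every span nested at or inside the outermost ending span
            # (innermost first), then reopen those that have not finished.
            first = min(open_spans.index(s) for s in ending)
            to_close = open_spans[first:]
            open_spans = open_spans[:first]
            for s in reversed(to_close):
                out.append(f"</{s[2]}>")
            for s in to_close:
                if s[1] != i:
                    out.append(f"<{s[2]}>")
                    open_spans.append(s)
        # Open spans that start here, longest first so that they nest.
        starting = sorted((s for s in spans if s[0] == i), key=lambda s: -s[1])
        for s in starting:
            out.append(f"<{s[2]}>")
            open_spans.append(s)
        out.append(escape(token) + " ")
    for s in reversed(open_spans):
        out.append(f"</{s[2]}>")
    out.append("</doc>")
    return "".join(out)


tokens = ["New", "York", "City", "Council"]
spans = [(0, 3, "place"), (2, 4, "org")]  # "place" and "org" cross at "City"
result = to_well_formed_xml(tokens, spans)
root = ET.fromstring(result)  # raises ParseError if the output is not well-formed
print(result)
```

In this sketch the `org` span is split into two elements, one inside `place` and one after it; a real tool would likely also record that the two pieces belong to the same annotation.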

21. (previously presented) The text annotation tool of claim 20, wherein the attributes
include tokenization, text normalization, part of speech tags, sentence boundaries, parse trees, 
and semantic attribute tagging. 



22-24. (canceled) 



25. (currently amended) The text annotation tool of claim 20, wherein [[annotating
the base tokens and patterns of base tokens comprises independent annotators,
wherein the annotators are of three types comprising]]:

the token attributes, which have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token; 

the constituent attributes are assigned yes-no values, where the entire pattern of 
each base token is considered to be a single constituent with respect to some annotation value; 
and 

where the links[[, which]] assign common identifiers to coreferring and other related
patterns of base tokens. 



26-40. (canceled) 

41. (currently amended) A method of extracting information from a document,
wherein the document includes text, comprising the steps of:

breaking the text into base tokens;

annotating the text with [[regular, expression based attributes and with tree based
attributes; and]] token attributes, constituent attributes, links, and tree-based attributes, using XML
as a basis for representing the annotated text;

resolving conflicting annotation boundaries in the annotated text to produce well-formed 
XML; and

extracting facts from the annotated text using text pattern recognition rules written in
rule-based information extraction language, [[using regular, expression based functionality, tree
based functionality, and auxiliary definitions in any combination]] wherein each text pattern
recognition rule comprises a pattern that describes text of interest, a label that names the pattern 
for testing and debugging purposes, and an action that indicates what should be done in response 
to a matching of the pattern, and wherein the text pattern recognition rules use regular 
expression-based functionality, tree-based functionality, and user-defined matching functions.

42. (canceled) 

43. (currently amended) The method of claim 41, wherein [[the parsing of the text
comprises breaking it into its base tokens and annotating the base tokens and patterns of base
tokens with a number of]] in the annotating step, the attributes include orthographic, syntactic,
semantic, pragmatic and dictionary-based attributes.

44. (canceled)

45. (currently amended) The method of claim 41, wherein the [[parsing of the text]]
annotating step is carried out by a plurality of independent annotators, wherein each of the
annotators has at least one specific annotation function, and wherein the method further
comprises the step of allowing a user to specify which of the annotators to use and the order of
their use.

46-47. (canceled)

48. (currently amended) The method of claim [[43]]41, wherein [[the step of breaking
the text into its base tokens and annotating the base tokens and patterns of base tokens is carried
out using independent annotators, wherein the annotators are of three types comprising]]:

the token attributes, which have a one-per-base-token alignment, where for the
attribute type represented, there is an attempt to assign an attribute to each base token; 

the constituent attributes are assigned yes-no values, where the entire pattern of 
each base token is considered to be a single constituent with respect to some annotation value; 
and 

the links, which assign common identifiers to coreferring and other related
patterns of base tokens. 

49. (currently amended) The method of claim [[43]]41, wherein the [[step of]] annotating
step [[a text further comprises the step of]] includes associating all annotations assigned to a
particular piece of text with the base tokens for that text to generate aligned annotations.

50. (canceled) 

51. (currently amended) The method of claim [[50]]41, wherein the [[step of
identifying and extracting potentially interesting pieces of information comprises recognizing]]
text pattern recognition rules have the ability to recognize both true left and right constituent
attributes and non-contiguous constituent attributes.

52. (currently amended) The method of claim [[50]]41, [[wherein the patterns are
found using at least one text pattern recognition rule written in a rule based information
extraction language,]] wherein the [[at least one]] text pattern recognition [[rule queries]] rules query for
at least one of literal text, attributes, and relationships found in the aligned annotations to define
the facts to be extracted.

53-55. (canceled) 

56. (new) The fact extraction tool set of claim 7, wherein the pattern recognition rules 
query for at least one of literal text, attributes, and relationships found in the annotated text to 
define facts to be extracted. 

57. (new) The fact extraction tool set of claim 7, wherein the means for annotating the
text represents the annotated text as a single view of the document expressed as inline XML. 

58. (new) The fact extraction tool set of claim 12, wherein the means for annotating 
the text represents the annotated text as a single view of the document expressed as inline XML. 

59. (new) The method of claim 41, wherein in the annotating step, the annotated text 
is represented as a single view of the document expressed as inline XML. 

60. (new) The fact extraction tool set of claim 7, wherein the means for extracting 
uses XPath for traversing XML-based tree representations in the annotated text. 

61. (new) The fact extraction tool set of claim 12, wherein the means for extracting 
uses XPath for traversing XML-based tree representations in the annotated text.

62. (new) The method of claim 41, wherein in the extracting step, XPath is used for 
traversing XML-based tree representations in the annotated text. 
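
Illustrative note (not part of the claims): claims 60 through 62 recite XPath for traversing XML-based tree representations in the annotated text. The sketch below uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module. The element names (`s`, `np`, `vp`, `tok`) and the attributes (`role`, `pos`) are hypothetical and are not taken from the application.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline-XML view of one annotated sentence.
annotated = (
    "<s>"
    "<np role='subj'><tok pos='NNP'>Acme</tok></np>"
    "<vp><tok pos='VBD'>acquired</tok>"
    "<np role='obj'><tok pos='NNP'>Widgetco</tok></np></vp>"
    "</s>"
)
root = ET.fromstring(annotated)

# Tree-based query: tokens of an object noun phrase directly under a verb phrase.
objects = [tok.text for tok in root.findall(".//vp/np[@role='obj']/tok")]

# Attribute-based query: every token tagged as a proper noun, in document order.
proper_nouns = [tok.text for tok in root.findall(".//tok[@pos='NNP']")]

print(objects)
print(proper_nouns)
```

A full XPath 1.0 engine would allow richer axes and functions; the standard-library subset is enough to show a path expression selecting nodes by tree position and by attribute value.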